NVIDIA Open Sources OmniVinci All-Modal Understanding Model with Only 1/6 of the Training Data
NVIDIA has released OmniVinci, an all-modal understanding model that outperforms leading models by 19.05 points across multiple benchmarks. Trained on only 0.2 trillion tokens, it achieves six times the data efficiency of its competitors. The model aims to unify the understanding of vision, audio, and text, advancing machines' multimodal cognitive capabilities.